AITopics | dawid-skene model

Neural Information Processing Systems http://nips.cc/

algorithm, annotator, dawid-skene model

Neural Information Processing Systems

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Neural Information Processing Systems, Dec-25-2025, 22:47:39 GMT

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labels from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle for applying such approaches, since tensor methods hinge on the availability of third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform state-of-the-art algorithms under a variety of scenarios.
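
The second-order identity behind the pairwise approach can be checked numerically. Under the Dawid-Skene model, annotators respond independently given the true label, so the joint distribution of annotator m's and annotator l's responses is R_ml = A_m D A_l^T, where A_m[j][k] = P(annotator m says j | true class k) and D = diag(P(Y = k)). The minimal sketch below compares that formula with a Monte Carlo estimate. The confusion matrices, prior, and helper names are made up for illustration; the sketch covers only the pairwise statistics and does not implement the paper's identification algorithms.

```python
import random

K = 2                    # number of classes (toy example)
prior = [0.6, 0.4]       # D = diag(prior), prior[k] = P(Y = k)
# confusion[m][j][k] = P(annotator m outputs j | true class k); each column sums to 1
confusion = [
    [[0.9, 0.2], [0.1, 0.8]],
    [[0.7, 0.3], [0.3, 0.7]],
]


def model_cooccurrence(m, l):
    """R_ml[j][jp] = sum_k A_m[j][k] * prior[k] * A_l[jp][k], i.e. A_m D A_l^T."""
    return [[sum(confusion[m][j][k] * prior[k] * confusion[l][jp][k] for k in range(K))
             for jp in range(K)] for j in range(K)]


def draw(rng, probs):
    """Sample an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def empirical_cooccurrence(m, l, n_items, seed=0):
    """Monte Carlo estimate of R_ml from simulated Dawid-Skene data."""
    rng = random.Random(seed)
    counts = [[0] * K for _ in range(K)]
    for _ in range(n_items):
        y = draw(rng, prior)
        # Annotators respond independently given the true class y
        j = draw(rng, [confusion[m][r][y] for r in range(K)])
        jp = draw(rng, [confusion[l][r][y] for r in range(K)])
        counts[j][jp] += 1
    return [[c / n_items for c in row] for row in counts]


R = model_cooccurrence(0, 1)
R_hat = empirical_cooccurrence(0, 1, 100_000)
print(R)
print(R_hat)
```

Each entry of R_ml involves only two annotators, so it can be estimated from the items those two annotators both labeled; this is the intuition behind the lower sample complexity the abstract refers to.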

crowdsourcing, identifiability and algorithm, pairwise co-occurrence

Neural Information Processing Systems

Crowdsourcing Without People: Modelling Clustering Algorithms as Experts

Lorentz, Jordyn E. A., Clark, Katharine M.

arXiv.org Artificial Intelligence, Oct-1-2025

This paper introduces mixsemble, an ensemble method that adapts the Dawid-Skene model to aggregate predictions from multiple model-based clustering algorithms. Unlike traditional crowdsourcing, which relies on human labels, the framework models the outputs of clustering algorithms as noisy annotations. Experiments on both simulated and real-world datasets show that, although mixsemble is not always the single best performer, it consistently comes close to the best result and avoids poor outcomes. This robustness makes it a practical choice when the true data structure is unknown, especially for non-expert users.
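
The idea of treating clustering outputs as noisy annotations can be sketched with the classic Dawid-Skene EM. One complication is that different clustering algorithms assign arbitrary cluster IDs, and the abstract does not say how mixsemble reconciles them. The sketch below therefore assumes one simple option (brute-force relabeling of each clustering to agree with a reference clustering) before running EM. `align_to_reference` and `dawid_skene` are hypothetical helpers, not the authors' code, and the toy clusterings are made up.

```python
from itertools import permutations


def align_to_reference(labels, reference, K):
    """Relabel one clustering so that it agrees as often as possible with a reference
    clustering (brute force over all K! relabelings; only sensible for small K)."""
    best = max(permutations(range(K)),
               key=lambda p: sum(p[a] == r for a, r in zip(labels, reference)))
    return [best[a] for a in labels]


def dawid_skene(annotations, K, n_iter=50, eps=1e-6):
    """Classic Dawid-Skene EM. annotations[m][i] is the label that source m
    (here: an aligned clustering) assigns to item i; returns hard labels."""
    M, N = len(annotations), len(annotations[0])
    # Initialise item posteriors T[i][k] with a soft majority vote
    T = [[0.0] * K for _ in range(N)]
    for m in range(M):
        for i in range(N):
            T[i][annotations[m][i]] += 1.0 / M
    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per source (eps-smoothed)
        prior = [sum(T[i][k] for i in range(N)) + eps for k in range(K)]
        total = sum(prior)
        prior = [p / total for p in prior]
        conf = []
        for m in range(M):
            counts = [[eps] * K for _ in range(K)]  # counts[k][j]: class k, source says j
            for i in range(N):
                for k in range(K):
                    counts[k][annotations[m][i]] += T[i][k]
            conf.append([[c / sum(row) for c in row] for row in counts])
        # E-step: posterior over each item's true class
        for i in range(N):
            w = list(prior)
            for m in range(M):
                for k in range(K):
                    w[k] *= conf[m][k][annotations[m][i]]
            s = sum(w)
            T[i] = [x / s for x in w]
    return [max(range(K), key=lambda k: T[i][k]) for i in range(N)]


# Hypothetical outputs of three clustering algorithms on eight items
reference = [0, 0, 0, 1, 1, 1, 2, 2]
others = [[2, 2, 2, 0, 0, 0, 1, 1],   # same partition, different cluster IDs
          [1, 1, 0, 2, 2, 2, 0, 0]]   # different IDs and one item moved
aligned = [reference] + [align_to_reference(c, reference, 3) for c in others]
print(dawid_skene(aligned, 3))
```

The products in the E-step are computed directly for readability; with many sources, working with log-probabilities would avoid underflow.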

algorithm, artificial intelligence, machine learning

arXiv.org Artificial Intelligence

2509.25395

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Shahana Ibrahim, Xiao Fu, Nikolaos Kargas, Kejun Huang

Neural Information Processing Systems, Aug-20-2025, 01:09:37 GMT

The data deluge comes with high demands for data labeling.

algorithm, annotator, dawid-skene model

Neural Information Processing Systems

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Neural Information Processing Systems, Oct-10-2024, 21:10:02 GMT

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labels from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle for applying such approaches, since tensor methods hinge on the availability of third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity.
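
A back-of-the-envelope sketch of why third-order statistics are harder to estimate than pairwise ones. It assumes, hypothetically, that each annotator labels each item independently with probability p: a pairwise joint-count table over K classes then has K^2 cells filled by about N p^2 co-labeled items, while a third-order table has K^3 cells filled by about N p^3 items. The numbers are illustrative only and are not the sample complexity bounds derived in the paper.

```python
def expected_co_labeled(n_items, p_label, order):
    """Expected number of items labeled by all `order` given annotators, assuming
    each annotator labels each item independently with probability p_label."""
    return n_items * p_label ** order


def items_per_cell(n_items, p_label, n_classes, order):
    """Average number of co-labeled items per cell of a K^order joint-count table."""
    return expected_co_labeled(n_items, p_label, order) / n_classes ** order


# Hypothetical setting: 10,000 items, each annotator labels 10% of them, 5 classes
pairwise = items_per_cell(10_000, 0.1, 5, order=2)
third_order = items_per_cell(10_000, 0.1, 5, order=3)
print(f"pairwise: {pairwise:.2f} items/cell, third-order: {third_order:.2f} items/cell")
```

In this setting a pairwise cell receives about 4 items on average, whereas most third-order cells stay empty, so the third-order estimates are far noisier for the same data.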

crowdsourcing, identifiability and algorithm, pairwise co-occurrence

Neural Information Processing Systems

A Provably Improved Algorithm for Crowdsourcing with Hard and Easy Tasks

Kong, Seo Taek, Mandal, Saptarshi, Katselis, Dimitrios, Srikant, R.

arXiv.org Artificial Intelligence, Feb-14-2023

Crowdsourcing is a popular method for estimating ground-truth labels by collecting noisy labels from workers. In this work, we are motivated by crowdsourcing applications where each worker can exhibit two levels of accuracy depending on a task's type. Applying algorithms designed for the traditional Dawid-Skene model to such a scenario yields performance that is limited by the hard tasks. Therefore, we first extend the model to allow worker accuracy to vary depending on a task's unknown type. Then we propose a spectral method to partition tasks by type. After separating tasks by type, any Dawid-Skene algorithm (i.e., any algorithm designed for the Dawid-Skene model) can be applied independently to each type to infer the true labels. We theoretically prove that when crowdsourced data contain tasks with varying levels of difficulty, our algorithm infers the true labels with higher accuracy than any Dawid-Skene algorithm. Experiments show that our method is effective in practical applications.
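
The two-stage pipeline described above (split tasks by type, then aggregate each type separately) can be sketched as follows. The spectral partitioning step is not reproduced: task types are assumed to be given. `weighted_vote` is a hypothetical, simplified stand-in for "any Dawid-Skene algorithm" in the binary setting; it estimates each worker's accuracy from agreement with the majority vote and then takes a log-odds weighted vote. The labels and task types are made up.

```python
import math
from collections import defaultdict


def weighted_vote(labels):
    """Hypothetical stand-in aggregator (binary, one accuracy per worker).
    labels: task -> {worker: label in {-1, +1}}; returns task -> estimated label."""
    majority = {t: 1 if sum(ws.values()) >= 0 else -1 for t, ws in labels.items()}
    agree, seen = defaultdict(int), defaultdict(int)
    for t, ws in labels.items():
        for w, y in ws.items():
            seen[w] += 1
            agree[w] += int(y == majority[t])
    # Laplace-smoothed accuracy a_w, converted to the weight log(a_w / (1 - a_w))
    weight = {w: math.log((agree[w] + 1) / (seen[w] - agree[w] + 1)) for w in seen}
    return {t: 1 if sum(weight[w] * y for w, y in ws.items()) >= 0 else -1
            for t, ws in labels.items()}


def aggregate_by_type(labels, task_type, aggregator=weighted_vote):
    """Split tasks by their (already known) type and aggregate each group separately,
    so worker weights are re-estimated within each type."""
    groups = defaultdict(dict)
    for t, ws in labels.items():
        groups[task_type[t]][t] = ws
    estimates = {}
    for group in groups.values():
        estimates.update(aggregator(group))
    return estimates


labels = {"a": {"w1": 1, "w2": 1, "w3": -1},
          "b": {"w1": -1, "w2": -1, "w3": -1},
          "c": {"w1": 1, "w2": -1, "w3": -1}}
task_type = {"a": "easy", "b": "easy", "c": "hard"}
print(aggregate_by_type(labels, task_type))
```

Because weights are estimated inside each group, a worker who is reliable on easy tasks but unreliable on hard ones gets a different weight in each group, which is the effect the extended model is meant to capture.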

artificial intelligence, machine learning, social media

arXiv.org Artificial Intelligence

2302.07393

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Ibrahim, Shahana, Fu, Xiao, Kargas, Nikolaos, Huang, Kejun

Neural Information Processing Systems, Mar-18-2020, 23:46:51 GMT

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labels from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle for applying such approaches, since tensor methods hinge on the availability of third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity.

crowdsourcing, identifiability and algorithm, pairwise co-occurrence

Neural Information Processing Systems

Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Ibrahim, Shahana, Fu, Xiao, Kargas, Nikos, Huang, Kejun

arXiv.org Machine Learning, Sep-26-2019

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labels from annotators. The classic Dawid-Skene estimator and its accompanying expectation maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle for applying such approaches, since tensor methods hinge on the availability of third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity. We show that the approach can identify the Dawid-Skene model under realistic conditions. We propose an algebraic algorithm reminiscent of convex geometry-based structured matrix factorization to solve the model identification problem efficiently, and an identifiability-enhanced algorithm for handling more challenging and critical scenarios. Experiments show that the proposed algorithms outperform state-of-the-art algorithms under a variety of scenarios.
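
The abstract describes an algebraic algorithm "reminiscent of convex geometry-based structured matrix factorization". A well-known member of that family is the successive projection algorithm (SPA) for separable nonnegative matrix factorization. The sketch below shows generic SPA only, under the separability assumption that every column is a convex combination of a few "pure" columns that themselves appear in the data. It is not the authors' algorithm, and how such a step would be applied to the co-occurrence matrices is not reproduced here; the data are made up.

```python
def spa(columns, r):
    """Successive projection algorithm (generic SPA, not the paper's method).
    columns: list of equal-length vectors; returns indices of r selected columns.
    Assumes r does not exceed the rank of the data (otherwise a norm can hit zero)."""
    residual = [list(col) for col in columns]
    chosen = []
    for _ in range(r):
        # Under separability, the column with the largest norm is a pure column
        norms = [sum(v * v for v in col) for col in residual]
        j = max(range(len(residual)), key=lambda i: norms[i])
        chosen.append(j)
        u, uu = residual[j], norms[j]
        # Project every column onto the orthogonal complement of the chosen column
        projected = []
        for col in residual:
            coef = sum(a * b for a, b in zip(col, u)) / uu
            projected.append([c - coef * ui for c, ui in zip(col, u)])
        residual = projected
    return chosen


# Hypothetical separable data: three pure columns plus two mixtures of them
X = [
    [1.5, 1.0, 0.0],  # 0.5 * X[1] + 0.5 * X[3]
    [3.0, 0.0, 0.0],  # pure
    [0.6, 0.6, 0.5],  # 0.2 * X[1] + 0.3 * X[3] + 0.5 * X[4]
    [0.0, 2.0, 0.0],  # pure
    [0.0, 0.0, 1.0],  # pure
]
print(spa(X, 3))
```

SPA relies only on projections and norm comparisons, which is why methods of this kind are described as algebraic rather than iterative optimization.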

algorithm, annotator, confusion matrix

arXiv.org Machine Learning

1909.12325

Optimal Inference in Crowdsourced Classification via Belief Propagation

Ok, Jungseul, Oh, Sewoong, Shin, Jinwoo, Yi, Yung

arXiv.org Machine Learning, Jan-11-2017

Crowdsourcing systems are popular for solving large-scale labelling tasks with low-paid workers. We study the problem of recovering the true labels from the possibly erroneous crowdsourced labels under the popular Dawid-Skene model. To address this inference problem, several algorithms have recently been proposed, but the best known guarantee is still significantly larger than the fundamental limit. We close this gap by introducing a tighter lower bound on the fundamental limit and proving that Belief Propagation (BP) exactly matches this lower bound. The guaranteed optimality of BP is the strongest in the sense that it is information-theoretically impossible for any other algorithm to correctly label a larger fraction of the tasks. Experimental results suggest that BP is close to optimal for all regimes considered and improves upon competing state-of-the-art algorithms.
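
As a point of reference, the sketch below computes the exact posterior of a single binary task under a one-coin (single reliability per worker) version of the Dawid-Skene model when every worker's probability of answering correctly is known. This is not belief propagation and not the paper's algorithm: in the setting the paper studies, reliabilities are unknown and must be inferred together with the labels. The function name and the numbers are hypothetical.

```python
import math


def posterior_positive(votes, reliability, prior_pos=0.5):
    """P(true label = +1 | votes) for one binary task under a one-coin
    Dawid-Skene model, ASSUMING each worker's reliability is known.
    votes: worker -> label in {-1, +1}; reliability: worker -> P(correct)."""
    log_odds = math.log(prior_pos / (1 - prior_pos))
    for worker, label in votes.items():
        p = reliability[worker]
        # A +1 vote multiplies the odds of +1 by p / (1 - p); a -1 vote divides by it
        log_odds += label * math.log(p / (1 - p))
    return 1 / (1 + math.exp(-log_odds))


# Two weak workers say +1, one strong worker says -1 (hypothetical numbers)
votes = {"w1": +1, "w2": +1, "w3": -1}
reliability = {"w1": 0.6, "w2": 0.6, "w3": 0.9}
print(round(posterior_positive(votes, reliability), 3))  # 0.2
```

Here a single worker with reliability 0.9 outweighs two workers with reliability 0.6, which illustrates why reliability-aware inference can do better than a plain majority vote.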

algorithm, bayesian inference, belief revision

arXiv.org Machine Learning

1602.03619

A Permutation-based Model for Crowd Labeling: Optimal Estimation and Robustness

Shah, Nihar B., Balakrishnan, Sivaraman, Wainwright, Martin J.

arXiv.org Machine Learning, Jun-30-2016

The aggregation and denoising of crowd-labeled data is a task that has gained increased significance with the advent of crowdsourcing platforms and massive datasets. In this paper, we propose a permutation-based model for crowd-labeled data that is a significant generalization of the common Dawid-Skene model, and introduce a new error metric by which to compare different estimators. Working in a high-dimensional non-asymptotic framework that allows both the number of workers and tasks to scale, we derive optimal rates of convergence for the permutation-based model. We show that the permutation-based model offers significant robustness in estimation due to its richness, while surprisingly incurring only a small additional statistical penalty as compared to the Dawid-Skene model. Finally, we propose a computationally efficient method, called the OBI-WAN estimator, that is uniformly optimal over a class intermediate between the permutation-based and the Dawid-Skene models, and is uniformly consistent over the entire permutation-based model class. In contrast, the guarantees for estimators available in prior literature are sub-optimal over the original Dawid-Skene model.
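
The structural difference between the two models can be made concrete in a binary setting, assuming the convention that Q[i][j] is the probability that worker i answers question j correctly. In a one-coin Dawid-Skene model each worker has a single accuracy, so every row of Q is constant; the permutation-based model only asks that Q become monotone along both rows and columns after some reordering of workers and questions. The hypothetical helper below tests that ordering condition by sorting rows and columns by their sums, which suffices because, in a matrix admitting such an ordering, any two rows (or columns) are entrywise comparable. It checks only the ordering, not any range constraints the paper may impose on the entries, and it is unrelated to the OBI-WAN estimator.

```python
def is_permutation_isotonic(Q):
    """True if Q becomes monotone (non-increasing) down every column and along every
    row after sorting rows by row sum and columns by column sum, both decreasing."""
    n, d = len(Q), len(Q[0])
    rows = sorted(range(n), key=lambda i: -sum(Q[i]))
    cols = sorted(range(d), key=lambda j: -sum(Q[i][j] for i in range(n)))
    S = [[Q[i][j] for j in cols] for i in rows]
    down_ok = all(S[i][j] >= S[i + 1][j] for i in range(n - 1) for j in range(d))
    across_ok = all(S[i][j] >= S[i][j + 1] for i in range(n) for j in range(d - 1))
    return down_ok and across_ok


# One-coin Dawid-Skene: each row constant, so the condition always holds
ds_like = [[0.9, 0.9, 0.9], [0.6, 0.6, 0.6], [0.75, 0.75, 0.75]]
# Question difficulty varies, but workers and questions can still be ordered
perm_like = [[0.6, 0.9], [0.55, 0.7]]
# Each worker is better on a different question: no consistent ordering exists
crossing = [[0.9, 0.6], [0.6, 0.9]]
print([is_permutation_isotonic(Q) for Q in (ds_like, perm_like, crossing)])
```

The second matrix lies outside the one-coin Dawid-Skene model, since accuracy depends on the question, yet it satisfies the ordering condition; this is the kind of extra flexibility the permutation-based model allows.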

artificial intelligence, estimator, machine learning

arXiv.org Machine Learning

1606.09632
